{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright 2025 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Instruction Following Eval Recipe**\n",
    "\n",
    "This Eval Recipe demonstrates how to compare performance of two models on a Instruction Following dataset using [Vertex AI Evaluation Service](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-overview)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<table align=\"left\">\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fcolab.research.google.com%2Fgithub%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fblob%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fmodel_upgrades%2Finstruction_following%2Fvertex_colab%2Finstruction_following_eval.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fcolab%2Fimport%2Fhttps%3A%252F%252Fraw.githubusercontent.com%252FGoogleCloudPlatform%252Fapplied-ai-engineering-samples%252Fmain%252Fgenai-on-vertex-ai%252Fgemini%252Fmodel_upgrades%252Finstruction_following%252Fvertex_colab%252Finstruction_following_eval.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://art-analytics.appspot.com/r.html?uaid=G-FHXEFWTT4E&utm_source=aRT-gemini-2-spatial-reasoning&utm_medium=aRT-clicks&utm_campaign=gemini-2-spatial-reasoning&destination=gemini-2-spatial-reasoning&url=https%3A%2F%2Fconsole.cloud.google.com%2Fvertex-ai%2Fworkbench%2Fdeploy-notebook%3Fdownload_url%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fapplied-ai-engineering-samples%2Fmain%2Fgenai-on-vertex-ai%2Fgemini%2Fmodel_upgrades%2Finstruction_following%2Fvertex_colab%2Finstruction_following_eval.ipynb\">\n",
    "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
    "    </a>\n",
    "  </td>\n",
    "  <td style=\"text-align: center\">\n",
    "    <a href=\"https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/gemini/model_upgrades/instruction_following/vertex_colab/instruction_following_eval.ipynb\">\n",
    "      <img width=\"32px\" src=\"https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
    "    </a>\n",
    "  </td>\n",
    "</table>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Use case: Instruction Following \n",
    "\n",
    "- Metric: This eval uses a Pairwise Instruction Following template to evaluate the responses and pick a model as the winner.\n",
    "\n",
    "- Evaluation Dataset is based on [Instruction Following Evaluation Dataset](https://github.com/google-research/google-research/blob/master/instruction_following_eval/data/input_data.jsonl). It includes 10 randomly sampled prompts in a JSONL file `dataset.jsonl` with the following structure:\n",
    "    - `prompt`: The task with specific instructions provided\n",
    "\n",
    "- Prompt Template is a zero-shot prompt located in [`prompt_template.txt`](./prompt_template.txt) with one prompt variable (`prompt`) that is automatically populated from our dataset.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configure Eval Settings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile .env\n",
    "PROJECT_ID=your-project-id          # Google Cloud Project ID\n",
    "LOCATION=us-central1                  # Region for all required Google Cloud services\n",
    "EXPERIMENT_NAME=instructionfollowing-eval-recipe-demo      # Creates Vertex AI Experiment to track the eval runs\n",
    "MODEL_BASELINE=gemini-1.5-flash  # Name of your current model\n",
    "MODEL_CANDIDATE=gemini-2.0-flash # This model will be compared to the baseline model\n",
    "DATASET_URI=\"gs://gemini_assets/instruction_following/dataset.jsonl\"  # Evaluation dataset in Google Cloud Storage\n",
    "PROMPT_TEMPLATE_URI=\"gs://gemini_assets/instruction_following/prompt_template.txt\"  # Text file in Google Cloud Storage\n",
    "METRIC_NAME = \"pairwise_instruction_following\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Python Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet google-cloud-aiplatform[evaluation] python-dotenv\n",
    "# The error \"session crashed\" is expected. Please ignore it and proceed to the next cell.\n",
    "import IPython\n",
    "IPython.Application.instance().kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import json\n",
    "import pandas as pd\n",
    "import sys\n",
    "import vertexai\n",
    "from dotenv import load_dotenv\n",
    "from google.cloud import storage\n",
    "\n",
    "from datetime import datetime\n",
    "from IPython.display import clear_output\n",
    "from vertexai.evaluation import EvalTask, EvalResult, PairwiseMetric,  MetricPromptTemplateExamples\n",
    "from vertexai.generative_models import GenerativeModel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Authenticate to Google Cloud (requires permission to open a popup window)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "load_dotenv(override=True)\n",
    "if os.getenv(\"PROJECT_ID\") == \"your-project-id\":\n",
    "    raise ValueError(\"Please configure your Google Cloud Project ID in the first cell.\")\n",
    "if \"google.colab\" in sys.modules:  \n",
    "    from google.colab import auth  \n",
    "    auth.authenticate_user()\n",
    "vertexai.init(project=os.getenv('PROJECT_ID'), location=os.getenv('LOCATION'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run the eval on both models on the Pairwise Autorater"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def load_file(gcs_uri: str) -> str:\n",
    "    blob = storage.Blob.from_string(gcs_uri, storage.Client())\n",
    "    return blob.download_as_string().decode('utf-8')\n",
    "\n",
    "def load_dataset(dataset_uri: str):\n",
    "    jsonl = load_file(dataset_uri)\n",
    "    samples = [json.loads(line) for line in jsonl.splitlines() if line.strip()]\n",
    "    df = pd.DataFrame(samples)\n",
    "    return df\n",
    "\n",
    "def load_prompt_template() -> str:\n",
    "    blob = storage.Blob.from_string(os.getenv(\"PROMPT_TEMPLATE_URI\"), storage.Client())\n",
    "    return blob.download_as_string().decode('utf-8')\n",
    "\n",
    "def run_eval(model: str) -> EvalResult:\n",
    "  timestamp = f\"{datetime.now().strftime('%b-%d-%H-%M-%S')}\".lower()\n",
    "  return EvalTask(\n",
    "      dataset=os.getenv(\"DATASET_URI\"),\n",
    "      metrics= [\n",
    "        PairwiseMetric(\n",
    "            metric=os.getenv('METRIC_NAME'),\n",
    "            metric_prompt_template=MetricPromptTemplateExamples.Pairwise.INSTRUCTION_FOLLOWING.metric_prompt_template,\n",
    "            # Baseline model for pairwise comparison\n",
    "            baseline_model=GenerativeModel(os.getenv(\"MODEL_BASELINE\")),\n",
    "        ),\n",
    "    ],\n",
    "      experiment=os.getenv('EXPERIMENT_NAME')\n",
    "  ).evaluate(\n",
    "      model=GenerativeModel(os.getenv(\"MODEL_CANDIDATE\")),\n",
    "      prompt_template=load_prompt_template(),\n",
    "      experiment_run_name=f\"{timestamp}-{model.replace('.', '-')}\"\n",
    "  )\n",
    "\n",
    "#baseline = run_eval(os.getenv(\"MODEL_BASELINE\"))\n",
    "metrics = run_eval(os.getenv(\"MODEL_CANDIDATE\"))\n",
    "clear_output()\n",
    "print(\"Baseline model win rate:\", round(metrics.summary_metrics[f'{os.getenv(\"METRIC_NAME\")}/baseline_model_win_rate'],3))\n",
    "print(\"Candidate model win rate:\", round(metrics.summary_metrics[f'{os.getenv(\"METRIC_NAME\")}/candidate_model_win_rate'],3))\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
